Abstract: This paper addresses the issue of efficient turbo packet combining techniques for coded transmission with a Chase-type automatic repeat request (ARQ) protocol operating over a multiple-input-multiple-output (MIMO) channel with intersymbol interference (ISI). First, we investigate the outage probability and the outage-based power loss of the MIMO-ISI ARQ channel when optimal maximum a posteriori (MAP) turbo packet combining is used at the receiver. We show that the ARQ delay (i.e., the maximum number of ARQ rounds) does not completely translate into a diversity gain. We then introduce two efficient turbo packet combining algorithms that are inspired by minimum mean square error (MMSE)-based turbo equalization techniques. Both schemes can be viewed as low-complexity versions of the optimal MAP turbo combiner. The first scheme, called signal-level turbo combining, performs packet combining and multiple-transmission ISI cancellation jointly at the signal level. The second scheme, called symbol-level turbo combining, allows ARQ rounds to be separately turbo equalized, while combining is performed at the filter output. We conduct a complexity analysis in which we demonstrate that both algorithms have almost the same computational cost as the conventional log-likelihood ratio (LLR)-level combiner. Simulation results show that both proposed techniques outperform LLR-level combining, while for some representative MIMO configurations, signal-level combining has better ISI cancellation capability and achievable diversity order than symbol-level combining.

Index Terms: Automatic repeat request (ARQ) mechanisms, multiple-input-multiple-output (MIMO), intersymbol interference (ISI), outage probability, turbo equalization, minimum mean square error (MMSE).



I. Introduction 
A. Research Motivation 

Hybrid automatic repeat request (ARQ) protocols and multiple-input-multiple-output (MIMO) techniques play a key role in the evolution of current wireless systems toward high data rate wireless broadband standards [1]. While MIMO techniques allow the space and time diversities of the multi-antenna channel to be translated into diversity and/or multiplexing gains [2], hybrid-ARQ mechanisms exploit the ARQ delay, i.e., the maximum number of ARQ transmission rounds, to reduce the frame error rate (FER) and therefore increase the system throughput [3], [4].

In the last few years, special interest has been paid to the joint design of the transmission combiner (also referred to as "packet combiner") and the signal processor (detection and/or equalization) at the receiver. Combining schemes targeting a joint design approach were first proposed by Samra and Ding for single antenna systems operating over intersymbol interference (ISI) channels [5], [6], [7], [8], and are called transmission combining with integrated equalization (IEQ). In particular, it was shown in [8] that, when concatenated with an outer code, IEQ performs better than the iterative combining scheme introduced by Doan and Narayanan [9]. In iterative combining, multiple copies of the same packet are independently interleaved and combining is performed by iterating between multiple equalizers before channel decoding. The IEQ concept was then extended to MIMO systems with flat fading to jointly perform co-antenna interference (CAI) cancellation and transmission combining [10], [11], [12]. In parallel, several other MIMO ARQ architectures exploiting the high degree of freedom in the design of the MIMO ARQ transmitter were proposed (e.g., [13]–[20]). Turbo coded ARQ schemes with iterative minimum mean square error (MMSE) frequency domain equalization (FDE) for single carrier transmission over broadband channels were proposed for direct sequence code division multiple access (DS-CDMA) and MIMO systems in [21] and [22], [23], respectively.

Recently, in a seminal paper by El Gamal et al. [24], the diversity-multiplexing tradeoff¹ of the MIMO ARQ flat fading channel was characterized, and was referred to as the diversity-multiplexing-delay tradeoff. The authors proved that the ARQ delay presents an important source of diversity even when the channel is constant over ARQ transmission rounds, a scenario referred to as the long-term static channel. In particular, it was shown that operating over such a channel with a large ARQ delay results in a flat diversity-multiplexing tradeoff. This means that one can achieve full diversity and multiplexing gains if large ARQ windows are allowed. The diversity-multiplexing-delay tradeoff was then investigated in the case of delay-sensitive services and block-fading MIMO channels in [29] and [30], respectively.

B. In this Paper 

Motivated by the IEQ concept [8] and the results in [24], we investigate efficient IEQ-aided packet combining strategies for coded transmission with hybrid-ARQ operating over MIMO-ISI channels. Our main objective is to reduce the number of ARQ rounds required to correctly decode a data packet while keeping the receiver complexity (computational load and memory requirements) affordable. In our design, packet combining is performed at each ARQ round by exchanging soft information in an iterative (turbo) fashion between the soft packet combiner and the soft-input-soft-output (SISO) decoder. We refer to this combining family as "turbo packet combining".

¹ A fundamental tool for the design of space-time coding/multiplexing architectures, initially proposed by Zheng and Tse for flat fading [25], and later extended to frequency selective fading [26], [27], [28].

We focus on space-time bit-interleaved coded modulation (ST-BICM) transmitter schemes with Chase-type ARQ, i.e., the data packet is entirely retransmitted. The choice of ST-BICM is motivated by the simplicity of this coding scheme, and the efficiency of its iterative decoding (ID) receiver in achieving high diversity and coding gains over block-fading MIMO-ISI channels [31]–[36]. Our work remains valid for other space-time codes (STCs). Note that some practical systems employ hybrid-ARQ with incremental redundancy (IR). In IR-type ARQ, retransmissions only carry portions of the data packet. IR is an efficient technique for increasing the system throughput while keeping the error performance acceptable. In this paper, we restrict our work to Chase-type ARQ. Turbo combining techniques for broadband MIMO transmission with IR-type ARQ are left for future investigations.

First, we derive the optimal² maximum a posteriori (MAP) turbo packet combining algorithm that makes use of all diversities available in the MIMO-ISI ARQ channel to perform transmission combining. The turbo packet combining strategies we introduce in this paper can be seen as low-complexity sub-optimal approximations of the MAP combining algorithm. An important ingredient in MAP turbo combining is an analogy between multiple transmissions and antennas, which consists of considering ARQ rounds as virtual receive antennas. This allows the ARQ delay, i.e., the maximum number of ARQ rounds, to be translated into receive diversity. We then analyze the outage performance of the MIMO-ISI ARQ channel. This analysis shows how the ARQ delay influences the outage probability of the MIMO ARQ system. It also serves as a theoretical foundation for the turbo packet combiners we propose in this paper. We also investigate the outage-based power loss due to multiple transmission rounds.
This analysis establishes that in the outage region of interest 
(corresponding to an outage between 10~^ and 10"'^) the 
power loss due to ARQ is below 0.25dB. 

The next step in our work corresponds to the derivation of two turbo packet combining strategies for the MIMO-ISI ARQ channel. Both techniques are inspired by the unconditional MMSE turbo equalization schemes of [34] and [?]. The first algorithm, named signal-level turbo packet combining, is a low-complexity version of MAP turbo combining. It performs packet combining and equalization using signals from all transmission rounds. In contrast to what was initially stated in [38], we show that the computational complexity of this scheme is less sensitive to the number of ARQ rounds. Moreover, we provide an optimized implementation where it is not necessary for the receiver to store all signal vectors and channel matrices. The second combining scheme, namely symbol-level turbo combining, performs soft equalization separately for each round, and combines multiple transmissions at the level of filter outputs. It has the same computational complexity as the first scheme and lower memory requirements. We also show that the receiver requirements (computational complexity and memory) of both turbo combining schemes are almost the same as those of conventional log-likelihood ratio (LLR)-level combining, where extrinsic LLRs corresponding to multiple transmissions are simply added together before SISO decoding. Finally, we provide numerical simulations for some MIMO configurations demonstrating the superior performance of the proposed algorithms compared with LLR-level combining, and the significant gains they offer with respect to both the outage probability and the matched filter bound (MFB).

² In this paper, optimality refers to the exploitation of delay, space, time, and multipath diversities of the MIMO-ISI ARQ channel to combine multiple transmissions.

Throughout the paper, the following notation is used. Superscript $^\top$ denotes transpose, and $^H$ denotes Hermitian transpose. $E[\cdot]$ is the mathematical expectation of the argument $(\cdot)$. When $\mathbf{X}$ is a square matrix, $\det(\mathbf{X})$ denotes the determinant of $\mathbf{X}$. For each complex vector $\mathbf{x} \in \mathbb{C}^N$, $\mathrm{diag}\{\mathbf{x}\}$ is the $N \times N$ diagonal matrix whose diagonal entries are the elements of $\mathbf{x}$. $\mathbf{I}_N$ is the $N \times N$ identity matrix, and $\mathbf{0}_{N \times Q}$ denotes an all-zero $N \times Q$ matrix. $\otimes$ is the Kronecker product, and $j = \sqrt{-1}$.

The remainder of the paper is organized as follows. In Section II, we provide a description of the MIMO ARQ system model and introduce some assumptions considered in this paper. In Section III, we derive the structure of the optimal MAP turbo combining scheme, and analyze the outage probability and the outage-based power loss of the considered MIMO ARQ system. Section IV details the structure of the proposed combining schemes and discusses complexity issues. Numerical results are provided in Section V. The paper is concluded in Section VI.

II. System Model and Assumptions 

We consider a multi-antenna link operating over a frequency selective fading channel and using an ARQ protocol at the upper layer. The transmitter and the receiver are equipped with $N_T$ transmit and $N_R$ receive antennas, respectively. The MIMO-ISI channel is composed of $L$ taps (index $l = 0, \dots, L-1$). Each data stream is encoded with the aid of a $\rho$-rate channel encoder, interleaved using a semi-random interleaver $\Pi$, then modulated and space-time multiplexed over the $N_T$ transmit antennas. This constitutes an ST-BICM coding scheme. The mapping function that relates each set of $M$ coded and interleaved bits $b_{1,t,i}, \dots, b_{M,t,i}$ to a symbol $s_{t,i}$ that belongs to the constellation set $\mathcal{S}$ is denoted $\varphi : \{0,1\}^M \to \mathcal{S}$, where $t = 1, \dots, N_T$ and $i = 0, \dots, T-1$ are the transmit antenna and the channel use indices, respectively, and $M = \log_2 |\mathcal{S}|$. The $N_T \times T$ symbol matrix corresponding to the entire frame is denoted

$$\mathbf{S} \triangleq [\mathbf{s}_0, \dots, \mathbf{s}_{T-1}] \in \mathcal{S}^{N_T \times T}, \tag{1}$$

where

$$\mathbf{s}_i \triangleq [s_{1,i}, \dots, s_{N_T,i}]^\top \in \mathcal{S}^{N_T} \tag{2}$$

is the vector of transmitted symbols at time instant $i$. The rate of this transmission scheme is therefore $R = \rho M N_T$.

Figure 1. ST-BICM diagram with ARQ and turbo packet combining: (a) transmitter, (b) receiver.

When the transmitter receives a negative acknowledgment (NACK) message due to an erroneously decoded block, subsequent transmission rounds occur until the packet is correctly received or a preset maximum number of rounds, i.e., the ARQ delay $K$, is reached. The round index is denoted $k = 1, \dots, K$. Reception of a positive acknowledgment (ACK) indicates a successful decoding and the transmitter moves on to the next block message. We suppose that the signaling channel carrying the one-bit ACK/NACK feedback message is error free. In addition, we assume perfect packet error detection (typically, using a cyclic redundancy check (CRC) code). Therefore, a decoding failure corresponds to an erroneous decoding outcome after $K$ rounds. We focus on Chase-type ARQ mechanisms, i.e., the symbol matrix $\mathbf{S}$ is completely retransmitted. Neither puncturing nor mapping diversity, i.e., optimization of the mapping function over transmission rounds, is investigated in this paper; both are left for future contributions. We use a zero padding (ZP) sequence $\mathbf{0}_{N_T \times L}$ to prevent inter-block interference (IBI). The ST-BICM scheme with ARQ is depicted in Fig. 1(a). The MIMO-ISI channel is assumed to be quasi-static block fading, i.e., constant over a frame that spans $T$ channel uses and changing independently from round to round. This scenario corresponds to the so-called short-term static channel case, where ARQ transmission rounds see different and independent channel realizations [24]. The long-term static channel corresponds to the case where the channel is constant over all rounds related to the transmission of the same information block, i.e., $\mathbf{H}_l^{(k)} = \mathbf{H}_l\ \forall k \in \{1, \dots, K\}$. Note that in orthogonal frequency division multiplexing (OFDM) broadband wireless systems, the ARQ channel is rather short-term static because frequency hopping is used to mitigate ISI. In time division multiplexing (TDM)-based systems, by contrast, the channel dynamic can be either short-term or long-term static depending on the Doppler spread. In addition, we suppose that the channel profile, i.e., the number of paths and the power distribution, is identical for at least $K$ consecutive rounds. This is a reasonable assumption for slowly time-varying wireless fading channels because the channel profile dynamic is mainly related to the shadowing effect. At the
$k$th round, the channel impulse response is represented by the $N_R \times N_T$ complex matrices $\mathbf{H}_0^{(k)}, \dots, \mathbf{H}_{L-1}^{(k)}$ corresponding respectively to taps $0, \dots, L-1$, and whose entries are zero-mean circularly symmetric Gaussian, $h_{r,t,l}^{(k)} \sim \mathcal{CN}(0, \sigma_l^2)$, where $h_{r,t,l}^{(k)}$ denotes the $(r,t)$th element of matrix $\mathbf{H}_l^{(k)}$. The total energy of taps $l = 0, \dots, L-1$ is normalized to one, i.e., $\sum_{l=0}^{L-1} \sigma_l^2 = 1$. Therefore, the channel energy per receive antenna $r = 1, \dots, N_R$ is

$$\sum_{l=0}^{L-1} \sum_{t=1}^{N_T} E\left[ \left| h_{r,t,l}^{(k)} \right|^2 \right] = N_T. \tag{3}$$



We suppose that no channel knowledge is available at the transmitter. Equal power transmission turns out to be the best power allocation strategy. In addition, under the assumption of infinitely deep interleaving, and by normalizing the symbol energy to one, we get

$$E\left[ \mathbf{s}_i \mathbf{s}_i^H \right] = \mathbf{I}_{N_T}. \tag{4}$$



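At the $k$th round, after down-conversion and sampling at the symbol rate, the baseband complex received signal on the $r$th antenna and at time instant $i$ is

$$y_{r,i}^{(k)} = \sum_{l=0}^{L-1} \sum_{t=1}^{N_T} h_{r,t,l}^{(k)} s_{t,i-l} + n_{r,i}^{(k)}, \tag{5}$$

where $n_{r,i}^{(k)}$ is the noise on the $r$th antenna, and $\mathbf{n}_i^{(k)} \triangleq [n_{1,i}^{(k)}, \dots, n_{N_R,i}^{(k)}]^\top \sim \mathcal{CN}(\mathbf{0}_{N_R \times 1}, \sigma^2 \mathbf{I}_{N_R})$.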
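As a rough illustration of the signal model (3)–(5), the following NumPy sketch draws one quasi-static channel realization and produces the received frame of one ARQ round. The function names (`draw_channel`, `transmit_round`), the array layout, and the treatment of symbols before the frame start as zeros (standing in for the ZP sequence) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def draw_channel(n_r, n_t, tap_powers, rng):
    """Draw L taps with i.i.d. CN(0, sigma_l^2) entries; returns shape (L, n_r, n_t)."""
    tap_powers = np.asarray(tap_powers, dtype=float)
    assert np.isclose(tap_powers.sum(), 1.0)  # tap energies are normalized to one
    L = tap_powers.size
    g = rng.standard_normal((L, n_r, n_t)) + 1j * rng.standard_normal((L, n_r, n_t))
    # Each real/imaginary part carries half of the tap variance.
    return g * np.sqrt(tap_powers / 2.0)[:, None, None]

def transmit_round(S, H, noise_var, rng):
    """Apply eq. (5): y_i = sum_l H_l s_{i-l} + n_i, with s_i = 0 for i < 0 (assumption)."""
    L, n_r, n_t = H.shape
    T = S.shape[1]
    Y = np.zeros((n_r, T), dtype=complex)
    for l in range(L):
        # Tap l delays the symbol matrix by l channel uses.
        Y[:, l:] += H[l] @ S[:, :T - l]
    noise = rng.standard_normal((n_r, T)) + 1j * rng.standard_normal((n_r, T))
    return Y + np.sqrt(noise_var / 2.0) * noise
```

Under Chase-type ARQ in the short-term static case, each round would call `draw_channel` again and pass the same symbol matrix `S` to `transmit_round`.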



III. Optimal Turbo Packet Combining and Outage 
Analysis 

In this section, we provide a brief description of the structure of the turbo packet combining concept we propose in this paper, and introduce the optimal MAP turbo combiner. We also investigate the outage probability and the outage-based transmit power loss, then provide a numerical analysis.

A. General Architecture and Optimal Turbo Combining 

The turbo packet combining strategies we propose in this paper allow decoding of a data packet transmitted over multiple MIMO-ISI channels in an iterative (turbo) fashion through the exchange of extrinsic information between the soft packet combiner and the SISO decoder. The main difference with conventional LLR-based packet combining is that multiple transmissions are combined before the computation of the soft information using a SISO packet combiner, while in LLR-level combining the soft outputs of different ARQ rounds are simply added together before channel decoding. The general block diagram is depicted in Fig. 1(b). Let $N$ denote the number of turbo iterations performed between the combiner and the decoder at the $k$th round (index $n = 1, \dots, N$), and

$$\boldsymbol{\phi}^e_{t,i,n} \triangleq [\phi^e_{1,t,i,n}, \dots, \phi^e_{M,t,i,n}]^\top, \quad (t,i) \in \{1, \dots, N_T\} \times \{0, \dots, T-1\} \tag{6}$$

denote the vectors of extrinsic log-likelihood ratio (LLR) values generated by the soft combiner at iteration $n$. $\phi^e_{m,t,i,n}$ is the extrinsic information related to coded and interleaved bit $b_{m,t,i}$ at turbo iteration $n$. We similarly define a priori vectors $\boldsymbol{\phi}^a_{t,i,n}$ available at the input of the soft combiner at iteration $n$. For the sake of notation simplicity, the round index is not used in LLRs. At the $n$th iteration of the $k$th round, the soft packet combiner makes use of the $N_T T$ a priori vectors $\boldsymbol{\phi}^a_{1,0,n}, \dots, \boldsymbol{\phi}^a_{N_T,T-1,n}$ and the received signals to combine transmissions corresponding to rounds $1, \dots, k$, and compute extrinsic vectors $\boldsymbol{\phi}^e_{1,0,n}, \dots, \boldsymbol{\phi}^e_{N_T,T-1,n}$. These extrinsic LLRs are de-interleaved and sent to the SISO decoder to compute a posteriori information about useful bits and extrinsic LLRs about coded bits. The generated extrinsic information is then interleaved and fed back to the soft combiner to serve as a priori information $\boldsymbol{\phi}^a_{1,0,n+1}, \dots, \boldsymbol{\phi}^a_{N_T,T-1,n+1}$ at the next iteration $n+1$. Note that the feedback of a NACK message does not necessarily mean that all information bits are erroneous. Therefore, extrinsic information generated by the SISO decoder during the last iteration of ARQ round $k-1$ can be used as a priori information at the first iteration of ARQ round $k$.³

³ Generally speaking, iterative processing at round $k$ will help correct information bits erroneously decoded during round $k-1$, while the LLR values of other bits remain the same.

PUBLISHED IN IEEE TRANSACTIONS ON COMMUNICATIONS, DEC. 2009 (PRINT AVAILABLE @ HTTP://DX.DOI.ORG/10.1109/TCOMM.2009.12.080318)

Now, let us focus on the optimal soft packet combiner that allows the exploitation of all diversities, i.e., space, time, multipath, and retransmission, present in the MIMO-ISI ARQ channel to iteratively compute extrinsic information about coded and interleaved bits. First, let us introduce

$$\mathbf{y}_i^{(k)} \triangleq [y_{1,i}^{(k)}, \dots, y_{N_R,i}^{(k)}]^\top \tag{7}$$

that groups the signals received at time instant $i$ of the $k$th round (5). We assume that the signals received at rounds $1, \dots, k$ (i.e., $\mathbf{y}_0^{(1)}, \dots, \mathbf{y}_{T-1}^{(k)}$) and their corresponding channel responses (i.e., $\mathbf{H}_0^{(1)}, \dots, \mathbf{H}_{L-1}^{(k)}$) are available at the receiver. Note that this assumption may present an important limiting factor (in addition to the computational complexity) for implementing the optimal turbo combiner, since all signals and channel responses have to be stored in the receiver. The low-complexity signal-level turbo combining strategy we introduce in Section IV relaxes this condition by using two recursions for keeping signals and channel matrices of previous rounds. At the $n$th iteration of round $k$, the optimal soft combiner computes the extrinsic LLR about coded and interleaved bit $b_{m,t,i}$ according to the MAP criterion

$$\phi^e_{m,t,i,n} = \log \frac{\Pr\left\{ \underline{\mathbf{y}}^{(k)} \mid b_{m,t,i} = 1;\ \mathbf{H}_0^{(1)}, \dots, \mathbf{H}_{L-1}^{(k)},\ \text{a priori LLRs} \right\}}{\Pr\left\{ \underline{\mathbf{y}}^{(k)} \mid b_{m,t,i} = 0;\ \mathbf{H}_0^{(1)}, \dots, \mathbf{H}_{L-1}^{(k)},\ \text{a priori LLRs} \right\}}, \tag{8}$$

where

$$\underline{\mathbf{y}}^{(k)} \triangleq \left[ \mathbf{y}_{T-1}^{(1)\top}, \dots, \mathbf{y}_{T-1}^{(k)\top}, \ \dots, \ \mathbf{y}_{0}^{(1)\top}, \dots, \mathbf{y}_{0}^{(k)\top} \right]^\top. \tag{9}$$

Note that this vector representation is of great importance because it allows us to view each transmission round as a source of an additional set of $N_R$ virtual receive antennas. Therefore, ARQ diversity translates into space diversity (i.e., virtual receive antennas). The signal vector $\underline{\mathbf{y}}^{(k)}$ corresponding to the transmission of matrix $\mathbf{S}$ over $k$ MIMO-ISI channels can be expressed as

$$\underline{\mathbf{y}}^{(k)} = \underline{\mathbf{H}}^{(k)} \mathbf{s} + \underline{\mathbf{n}}^{(k)}, \tag{10}$$

where $\mathbf{s} \in \mathcal{S}^{N_T T}$ is the vectorized symbol matrix and $\underline{\mathbf{H}}^{(k)}$ is a $k N_R T \times N_T T$ block Toeplitz matrix. With respect to (10), the extrinsic LLR given by (8) can now be expressed as

$$\phi^e_{m,t,i,n} = \log \frac{\sum_{\mathbf{s} \in \mathcal{S}^{1}_{m,t,i}} \exp\left\{ -\frac{1}{\sigma^2} \left\| \underline{\mathbf{y}}^{(k)} - \underline{\mathbf{H}}^{(k)} \mathbf{s} \right\|^2 + \sum_{(m',t',i') \neq (m,t,i)} \varphi^{-1}_{m'}(s_{t',i'})\, \phi^a_{m',t',i',n} \right\}}{\sum_{\mathbf{s} \in \mathcal{S}^{0}_{m,t,i}} \exp\left\{ -\frac{1}{\sigma^2} \left\| \underline{\mathbf{y}}^{(k)} - \underline{\mathbf{H}}^{(k)} \mathbf{s} \right\|^2 + \sum_{(m',t',i') \neq (m,t,i)} \varphi^{-1}_{m'}(s_{t',i'})\, \phi^a_{m',t',i',n} \right\}}, \tag{14}$$

where $\mathcal{S}^{b}_{m,t,i} = \left\{ \mathbf{s} \in \mathcal{S}^{N_T T} \mid \varphi^{-1}_{m}(s_{t,i}) = b \right\}$, $b = 0, 1$.
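To make the virtual-antenna view behind the MAP combiner concrete, the sketch below evaluates a brute-force extrinsic LLR in a deliberately reduced setting: a flat channel ($L = 1$), one channel use, and BPSK with bit 1 mapped to $+1$. These restrictions, the function name `map_extrinsic_llrs`, and the $1/\sigma^2$ metric scaling (chosen to match circularly symmetric noise of variance $\sigma^2$) are assumptions of this example; the general case requires the block Toeplitz model (10) and is exponential in $N_T T$.

```python
import itertools
import numpy as np

def map_extrinsic_llrs(y_rounds, H_rounds, noise_var, prior_llrs=None):
    """Brute-force MAP extrinsic LLRs after stacking rounds as virtual receive antennas.

    y_rounds: list of length-N_R vectors, one per ARQ round.
    H_rounds: list of (N_R, N_T) flat-channel matrices, one per ARQ round.
    """
    y = np.concatenate([np.asarray(v, dtype=complex) for v in y_rounds])
    H = np.vstack([np.asarray(h, dtype=complex) for h in H_rounds])
    n_t = H.shape[1]
    prior = np.zeros(n_t) if prior_llrs is None else np.asarray(prior_llrs, dtype=float)
    num = np.full(n_t, -np.inf)  # log-sum over hypotheses with bit = 1
    den = np.full(n_t, -np.inf)  # log-sum over hypotheses with bit = 0
    for bits in itertools.product((0, 1), repeat=n_t):
        b = np.array(bits)
        s = 2.0 * b - 1.0  # BPSK mapping (assumption): 0 -> -1, 1 -> +1
        metric = -np.linalg.norm(y - H @ s) ** 2 / noise_var
        for t in range(n_t):
            # Extrinsic: a priori terms of all bits except the one being computed.
            ext = metric + np.dot(b, prior) - b[t] * prior[t]
            if b[t] == 1:
                num[t] = np.logaddexp(num[t], ext)
            else:
                den[t] = np.logaddexp(den[t], ext)
    return num - den
```

With a single transmit antenna the stacked LLR reduces to the sum of the per-round LLRs; with several antennas the joint search over the stacked model generally differs from adding per-round LLRs, which is the distinction the text draws with LLR-level combining.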

B. Outage Probability and Outage-Based Transmit Power 
Loss 

It is well known that for non-ergodic channels, i.e., block fading quasi-static channels, the outage probability $P_{out}$ [39], [40], [41] is regarded as a meaningful tool for performance evaluation because it provides a lower bound on the block error rate (BLER) [42, p. 187]. The outage probability is defined as the probability that the mutual information, as a function of the channel realization and the average signal to noise ratio (SNR) $\gamma$ per receive antenna, is below the transmission rate $R$. Mutual information rates of quasi-static frequency selective fading MIMO channels have been investigated in [43], [44].

1) Outage Probability: To derive the outage probability of the considered MIMO ARQ system, we use renewal theory [45], which was first used by Zorzi and Rao to analyze the performance of ARQ protocols [46]. Recently, it was also used in [47], [24] to evaluate the performance of ARQ systems operating over wireless flat fading channels. Let $\mathcal{A}_k$ denote the event that an ACK message is fed back at round $k$, and $\mathcal{E}_k$ the event that the ARQ system is in outage at round $k$. Under the assumption of perfect packet error detection and error-free ACK/NACK feedback, and by applying renewal theory, the outage probability for a given SNR $\gamma$ and target rate $R$ is given as



(15) 



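Note that a Chase-type ARQ mechanism with an ARQ delay $K$ can be viewed as a repetition coding scheme where $K$ parallel sub-channels are used to transmit one symbol message [42, p. 194]. Therefore, (15) can be expressed as

$$P_{out}(\gamma) = \Pr\left\{ \frac{1}{K}\, I\left( \mathbf{s}; \underline{\mathbf{y}}^{(K)} \mid \underline{\mathbf{H}}^{(K)}, \gamma \right) < R,\ \bar{\mathcal{A}}_1, \dots, \bar{\mathcal{A}}_{K-1} \right\}, \tag{16}$$

where $\bar{\mathcal{A}}_k$ denotes the complement of $\mathcal{A}_k$.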



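The virtual $K N_R \times N_T$ MIMO-ISI communication model at the $K$th ARQ round is

$$\begin{bmatrix} \mathbf{y}_i^{(1)} \\ \vdots \\ \mathbf{y}_i^{(K)} \end{bmatrix} = \sum_{l=0}^{L-1} \begin{bmatrix} \mathbf{H}_l^{(1)} \\ \vdots \\ \mathbf{H}_l^{(K)} \end{bmatrix} \mathbf{s}_{i-l} + \begin{bmatrix} \mathbf{n}_i^{(1)} \\ \vdots \\ \mathbf{n}_i^{(K)} \end{bmatrix},$$

and the mutual information $I(\mathbf{s}; \underline{\mathbf{y}}^{(K)} \mid \underline{\mathbf{H}}^{(K)}, \gamma)$ in (15) can therefore be expressed, in the case of i.i.d. circularly symmetric complex Gaussian channel inputs, as in [43], i.e.,

$$I\left( \mathbf{s}; \underline{\mathbf{y}}^{(K)} \mid \underline{\mathbf{H}}^{(K)}, \gamma \right) = \frac{1}{T} \sum_{i=0}^{T-1} \log_2 \det\left( \mathbf{I}_{K N_R} + \frac{\gamma}{N_T} \boldsymbol{\Lambda}_i^{(K)} \boldsymbol{\Lambda}_i^{(K)H} \right), \tag{17}$$

where $\boldsymbol{\Lambda}_i^{(K)}$ is the discrete Fourier transform (DFT) of the $K$th round $K N_R \times N_T$ virtual MIMO-ISI channel at the $i$th frequency bin, i.e.,

$$\boldsymbol{\Lambda}_i^{(K)} \triangleq \sum_{l=0}^{L-1} \begin{bmatrix} \mathbf{H}_l^{(1)} \\ \vdots \\ \mathbf{H}_l^{(K)} \end{bmatrix} \exp\left\{ -j \frac{2\pi}{T} i l \right\}. \tag{18}$$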



2) Outage-Based Transmit Power Loss: To compare the outage probability performance of different ARQ configurations that operate at the same rate $R$ but use different ARQ delays, we consider a short-term power constraint scenario where the same power $\Gamma$ is used for all transmission rounds, i.e., the $k$th round transmit power is $\Gamma_k = \Gamma\ \forall k$. We evaluate the power loss incurred by multiple transmission rounds due to link outage. Note that system performance can be improved when a power control algorithm is jointly used with packet combining (typically, a long-term power constraint scenario), but this is beyond the scope of this paper. The average SNR present in the outage expression (16) is therefore given as



^Nt 



(19) 



Let $p$ count the number of information blocks, $q = 1, \dots, p$ denote the block index, and $\mathcal{T}_q$ the number of rounds used for transmitting block $q$. Therefore, for a given ARQ delay $K$, average SNR $\gamma$, and rate $R$, the average transmit power is

$$\Gamma_{avg} = \lim_{p \to \infty} \frac{\sum_{q=1}^{p} \mathcal{T}_q \Gamma}{p} = E\left[ \mathcal{T} \mid K, \gamma, R \right] \Gamma. \tag{20}$$

This indicates that an ARQ protocol with an ARQ delay $K$ and operating with rate $R$ at average SNR $\gamma$ incurs an outage-based transmit power loss of $10 \log_{10}\left( E[\mathcal{T} \mid K, \gamma, R] \right)$ compared with an ARQ with $K = 1$ round (i.e., no retransmissions).
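A worked instance of (20) may help. The short sketch below estimates the outage-based transmit power loss from an observed sequence of round counts $\mathcal{T}_q$; the function name `outage_power_loss_db` and the use of a finite sample mean in place of the limit $p \to \infty$ are assumptions of this example.

```python
import numpy as np

def outage_power_loss_db(rounds_per_block):
    """Estimate 10*log10(E[T]) from the number of rounds used by each block."""
    t = np.asarray(rounds_per_block, dtype=float)
    # The sample mean stands in for the expectation E[T | K, gamma, R].
    return 10.0 * np.log10(t.mean())
```

For example, if three blocks out of four are decoded in the first round and the fourth needs two rounds, the average number of rounds is 1.25 and the loss is about 0.97 dB; if every block succeeds in the first round the loss is 0 dB.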

C. Outage Analysis 

In this subsection we investigate, using simulations, both the outage probability and the outage-based transmit power loss for some MIMO-ISI ARQ configurations. This will serve as a theoretical foundation for the performance evaluation of the turbo packet combiners which we will introduce in the next section.

Figure 2. Outage probability versus the maximum number of rounds $K$ for $L = 2$ taps, $N_R = 2$, and: (a) $N_T = 2$, $R = 2$, (b) $N_T = 4$, $R = 4$.

Let us consider a MIMO-ISI channel with $L = 2$ taps and equally distributed power, i.e., $\sigma_0^2 = \sigma_1^2 = \frac{1}{2}$. We use Monte Carlo simulations to evaluate the outage probability (15) of the considered ARQ system. We choose $T = 256$ channel uses. At each round $k$, an $N_R \times N_T$ MIMO-ISI channel $\mathbf{H}_0^{(k)}$ and $\mathbf{H}_1^{(k)}$ is generated, and the achievable rate after $k$ rounds is computed using (17). If the target rate $R$ is not reached and $k < K$, the system moves on to the next round $k + 1$. The ARQ process is stopped and another is started, either because of system outage (i.e., the achievable rate after $K$ rounds is below $R$) or non-outage (i.e., the achievable rate is greater than $R$ after round $k \leq K$).
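The Monte Carlo procedure described above can be sketched as follows. This is an illustrative reading rather than the authors' simulator: the names `mutual_information` and `simulate_outage`, the default block count, and in particular the division of the stacked mutual information by the number of rounds $k$ (following the repetition-coding view of Chase-type ARQ) are assumptions of this example.

```python
import numpy as np

def mutual_information(H_stack, snr, T):
    """Rate of a stacked channel, H_stack of shape (L, k*N_R, N_T), from its T-point DFT."""
    n_t = H_stack.shape[2]
    # Zero-padded FFT along the tap axis: Lam[i] = sum_l H_l exp(-j 2 pi i l / T).
    Lam = np.fft.fft(H_stack, n=T, axis=0)
    G = Lam.conj().transpose(0, 2, 1) @ Lam
    # det(I + c Lam Lam^H) = det(I + c Lam^H Lam), so the smaller N_T x N_T form is used.
    _, logdet = np.linalg.slogdet(np.eye(n_t) + (snr / n_t) * G)
    return float(np.mean(logdet) / np.log(2.0))

def simulate_outage(n_r, n_t, tap_powers, K, R, snr_db, T=256, n_blocks=2000, seed=0):
    """Return (estimated outage probability, average number of rounds per block)."""
    rng = np.random.default_rng(seed)
    snr = 10.0 ** (snr_db / 10.0)
    L = len(tap_powers)
    scale = np.sqrt(np.asarray(tap_powers, dtype=float) / 2.0)[:, None, None]
    outages, rounds = 0, []
    for _ in range(n_blocks):
        H_stack = np.zeros((L, 0, n_t), dtype=complex)
        for k in range(1, K + 1):
            # Short-term static channel: an independent realization at every round.
            Hk = scale * (rng.standard_normal((L, n_r, n_t))
                          + 1j * rng.standard_normal((L, n_r, n_t)))
            H_stack = np.concatenate([H_stack, Hk], axis=1)  # rounds as virtual antennas
            if mutual_information(H_stack, snr, T) / k >= R:  # 1/k normalization (assumption)
                break
        else:
            outages += 1  # target rate not reached after K rounds
        rounds.append(k)
    return outages / n_blocks, float(np.mean(rounds))
```

The average number of rounds returned by `simulate_outage` is the quantity $E[\mathcal{T} \mid K, \gamma, R]$ that enters the transmit power loss in (20).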

In Fig. 2(a), we plot the outage probability as a function of the ARQ delay $K$ for the two-path MIMO-ISI channel with two transmit and two receive antennas ($N_T = N_R = 2$), and a target rate $R = 2$. The ARQ diversity gain, due to the short-term static channel dynamic, clearly appears when $K = 2$. For instance, a gain of approximately 1 dB is achieved
at 5 * 10^"^ outage compared with the case of K = 1 (i.e., 
no ARQ). When $K = 3$, the outage probability performance is similar to that of $K = 2$. Fig. 2(b) shows the outage curves for $N_T = 4$ and $N_R = 2$ with a target rate $R = 4$. We notice that, as in the previous configuration, $K = 2$ and $K = 3$ have the same outage performance, while the overall diversity gain is larger than that corresponding to $N_T = N_R = 2$ (i.e., the outage curve slopes are steeper than those of the first configuration). Note that the stacking procedure (9) relative to the optimal MAP-based turbo combiner creates $k N_R$ virtual receive antennas after $k$ rounds, but not all these virtual antennas translate into receive diversity, because the target rate $R$ has to be maintained, as can be seen from the expression of the achievable information rate in (15). This justifies the outage performance saturation after $K = 2$. This issue was recently addressed in [24] for MIMO ARQ with flat fading, and it was demonstrated that the diversity gain does not increase linearly with the ARQ delay $K$.⁴

In Fig. 3, we present the outage-based transmit power loss for the considered MIMO configurations. We observe that in the region of low SNR, the outage-based loss is significant for both $K = 2$ and $K = 3$. When the outage probability is below
< 10^^ (the region corresponding to FER values typically 
required in practical systems), the transmit power loss is below 
0.25dB. This indicates that in the corresponding SNR region, 
blocks are mainly error-free during the first transmission, and 
only a small number of frames require additional rounds. 

Motivated by these theoretical results, in the next section 
we design a class of reduced complexity MMSE-based turbo 
combiners. 



IV. Low Complexity MMSE-Based Turbo Packet 
Combining 

The complexity of the MAP turbo combining technique presented in Subsection III-A is exponential in the number of transmit antennas and channel uses. In this section, we introduce two low-complexity turbo packet combining techniques using the MMSE criterion, and analyze their computational cost and memory requirements.

⁴ In [24, Theorem 2], the authors demonstrated that for the case of a short-term static flat fading MIMO ARQ channel, the optimal diversity gain is $d^*(r_e, K) = K f\left( \frac{r_e}{K} \right)$, $0 \leq r_e < \min\{N_T, N_R\}$, where $r_e$ is the multiplexing gain and $f$ is the piecewise linear function connecting the points $(x, (N_T - x)(N_R - x))$ for $x = 0, \dots, \min\{N_T, N_R\}$.

Figure 3. Outage-based transmit power loss for $N_T = N_R = 2$, $R = 2$, and $N_T = 4$, $N_R = 2$, $R = 4$.



A. Signal-Level Turbo Combining 

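Let us recall the MAP turbo combiner block communication model (10) with a block length $\kappa = \kappa_1 + \kappa_2 + 1 \ll T$, where $\kappa_1$ and $\kappa_2$ are the lengths of the forward and backward filters, respectively. The corresponding $k N_R \kappa \times N_T(\kappa + L - 1)$ sliding-window (around channel use $i$) communication model after $k$ rounds is similar to (10), and is given as

$$\underline{\underline{\mathbf{y}}}_i^{(k)} = \underline{\underline{\mathbf{H}}}^{(k)} \underline{\mathbf{s}}_i + \underline{\underline{\mathbf{n}}}_i^{(k)}, \tag{21}$$

where

$$\underline{\underline{\mathbf{y}}}_i^{(k)} \triangleq \left[ \underline{\mathbf{y}}_i^{(1)\top}, \dots, \underline{\mathbf{y}}_i^{(k)\top} \right]^\top, \tag{22}$$

$$\underline{\underline{\mathbf{n}}}_i^{(k)} \triangleq \left[ \underline{\mathbf{n}}_i^{(1)\top}, \dots, \underline{\mathbf{n}}_i^{(k)\top} \right]^\top \tag{23}$$

are $k N_R \kappa \times 1$ complex vectors,

$$\underline{\mathbf{s}}_i \triangleq \left[ \mathbf{s}_{i+\kappa_1}^\top, \dots, \mathbf{s}_{i-\kappa_2-L+1}^\top \right]^\top \in \mathcal{S}^{N_T(\kappa+L-1)}, \tag{24}$$

and $\underline{\underline{\mathbf{H}}}^{(k)} \in \mathbb{C}^{k N_R \kappa \times N_T(\kappa+L-1)}$ is defined similarly to (11).

To compute, at the $n$th iteration, extrinsic information $\phi^e_{m,t,i,n}$ about bit $b_{m,t,i}$ using signals received during rounds $1, \dots, k$, we jointly (over all rounds) cancel soft ISI in a parallel interference cancellation (PIC) fashion. This yields a soft ISI-free signal vector $\underline{\underline{\mathbf{y}}}^{(k)}_{i|(t,n)} \in \mathbb{C}^{k N_R \kappa}$ expressed as

$$\underline{\underline{\mathbf{y}}}^{(k)}_{i|(t,n)} \triangleq \underline{\underline{\mathbf{y}}}_i^{(k)} - \underline{\underline{\mathbf{H}}}^{(k)} \tilde{\underline{\mathbf{s}}}_{i|(t,n)}, \tag{25}$$

where $\tilde{\underline{\mathbf{s}}}_{i|(t,n)}$ is the conditional average of symbol vector $\underline{\mathbf{s}}_i$ with zero at the $(\kappa_1 N_T + t)$th position,

$$\tilde{\underline{\mathbf{s}}}_{i|(t,n)} \triangleq E\left[ \underline{\mathbf{s}}_i \mid \phi^a_{m',t',i',n} : (i',t') \neq (i,t) \right]. \tag{26}$$

The components of $\underline{\underline{\mathbf{y}}}^{(k)}_{i|(t,n)}$ are then combined using an unconditional MMSE filter to produce the scalar input $\zeta^{(k)}_{t,i,n}$ for the soft demapper. Applying the matrix inversion lemma [48] similarly to [37, eq. 6], we can write the output of the unconditional MMSE filter as

$$\zeta^{(k)}_{t,i,n} = \alpha^{(k)}_{t,n}\, \mathbf{e}_t^\top \underline{\underline{\mathbf{H}}}^{(k)H} \mathbf{A}_n^{(k)-1} \underline{\underline{\mathbf{y}}}^{(k)}_{i|(t,n)}, \tag{27}$$

where

$$\mathbf{A}_n^{(k)} \triangleq \underline{\underline{\mathbf{H}}}^{(k)} \boldsymbol{\Xi}_n \underline{\underline{\mathbf{H}}}^{(k)H} + \sigma^2 \mathbf{I}_{k N_R \kappa}, \tag{28}$$

$$\boldsymbol{\Xi}_n \triangleq \mathbf{I}_{\kappa+L-1} \otimes \tilde{\boldsymbol{\Xi}}_n \in \mathbb{R}^{N_T(\kappa+L-1) \times N_T(\kappa+L-1)}, \tag{29}$$

$$\tilde{\boldsymbol{\Xi}}_n \triangleq \mathrm{diag}\left\{ \sigma^2_{1,n}, \dots, \sigma^2_{N_T,n} \right\}, \tag{30}$$

$$\mathbf{e}_t \triangleq \left[ \mathbf{0}_{1 \times (\kappa_1 N_T + t - 1)},\ 1,\ \mathbf{0}_{1 \times ((\kappa_2 + L) N_T - t)} \right]^\top \in \mathbb{C}^{N_T(\kappa+L-1)}, \tag{31}$$

$$\alpha^{(k)}_{t,n} = \left( 1 + \left( 1 - \sigma^2_{t,n} \right) \mathbf{e}_t^\top \underline{\underline{\mathbf{H}}}^{(k)H} \mathbf{A}_n^{(k)-1} \underline{\underline{\mathbf{H}}}^{(k)} \mathbf{e}_t \right)^{-1}, \tag{32}$$

and $\sigma^2_{t,n}$ is the unconditional variance at iteration $n$ of symbols $\{s_{t,i}\}_{i=0}^{T-1}$ transmitted over antenna $t$,

$$\sigma^2_{t,n} \triangleq \frac{1}{T} \sum_{i=0}^{T-1} E\left[ \left| s_{t,i} - \tilde{s}_{t,i,n} \right|^2 \mid \phi^a_{m,t,i,n} : m = 1, \dots, M \right], \tag{33}$$

where

$$\tilde{s}_{t,i,n} \triangleq E\left[ s_{t,i} \mid \phi^a_{m,t,i,n} : m = 1, \dots, M \right] \tag{34}$$

is the conditional average of symbol $s_{t,i}$ at iteration $n$.

Combining the soft PIC (25) and unconditional MMSE filtering (27) steps, and after some matrix manipulations, we can write the soft demapper input $\zeta^{(k)}_{t,i,n}$ as

$$\zeta^{(k)}_{t,i,n} = \mathbf{F}^{(k)}_{t,n} \mathbf{z}_i^{(k)} - \mathbf{B}^{(k)}_{t,n} \tilde{\underline{\mathbf{s}}}_{i|(t,n)}. \tag{35}$$

$\mathbf{F}^{(k)}_{t,n}$ and $\mathbf{B}^{(k)}_{t,n}$ are the forward and backward filters corresponding to antenna $t$ at the $n$th iteration,

$$\mathbf{F}^{(k)}_{t,n} = \left( \sigma^2 + \left( 1 - \sigma^2_{t,n} \right) \mathbf{e}_t^\top \boldsymbol{\Delta}_n^{(k)} \mathbf{D}^{(k)} \mathbf{e}_t \right)^{-1} \mathbf{e}_t^\top \boldsymbol{\Delta}_n^{(k)}, \tag{36}$$

$$\mathbf{B}^{(k)}_{t,n} = \mathbf{F}^{(k)}_{t,n} \mathbf{D}^{(k)}. \tag{37}$$

$\boldsymbol{\Delta}_n^{(k)}$, $\mathbf{z}_i^{(k)}$, and $\mathbf{D}^{(k)}$ are given as

$$\boldsymbol{\Delta}_n^{(k)} \triangleq \left( \mathbf{I}_{N_T(\kappa+L-1)} + \sigma^{-2} \mathbf{D}^{(k)} \boldsymbol{\Xi}_n \right)^{-1}, \tag{38}$$

$$\mathbf{z}_i^{(k)} = \mathbf{z}_i^{(k-1)} + \underline{\mathbf{H}}^{(k)H} \underline{\mathbf{y}}_i^{(k)}, \quad \mathbf{z}_i^{(0)} = \mathbf{0}_{N_T(\kappa+L-1) \times 1}, \tag{39}$$

$$\mathbf{D}^{(k)} = \mathbf{D}^{(k-1)} + \underline{\mathbf{H}}^{(k)H} \underline{\mathbf{H}}^{(k)}, \quad \mathbf{D}^{(0)} = \mathbf{0}_{N_T(\kappa+L-1) \times N_T(\kappa+L-1)}. \tag{40}$$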
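The practical value of recursions (39) and (40) is that only one vector and one matrix of fixed size need to be kept between rounds. The sketch below, written for a single window position $i$, accumulates them round by round and builds the forward and backward filters (36) and (37) from the accumulated matrix alone. The helper names (`sliding_toeplitz`, `update_state`, `forward_backward_filters`), the 0-based antenna index, and the form used for the inverse in (38) are assumptions of this example.

```python
import numpy as np

def sliding_toeplitz(H_taps, kappa):
    """Per-round block Toeplitz matrix of size (N_R*kappa, N_T*(kappa+L-1))."""
    L, n_r, n_t = H_taps.shape
    H = np.zeros((n_r * kappa, n_t * (kappa + L - 1)), dtype=complex)
    for row in range(kappa):
        for l in range(L):
            # Block row `row` holds H_0 ... H_{L-1}, shifted right by `row` blocks.
            H[row * n_r:(row + 1) * n_r, (row + l) * n_t:(row + l + 1) * n_t] = H_taps[l]
    return H

def update_state(z_prev, D_prev, H_k, y_k):
    """One step of recursions (39)-(40): fold round k into the running z and D."""
    z = z_prev + H_k.conj().T @ y_k
    D = D_prev + H_k.conj().T @ H_k
    return z, D

def forward_backward_filters(D, sym_var, t_index, kappa1, noise_var):
    """Filters (36)-(37) for antenna t_index (0-based), given per-antenna variances."""
    n_t = len(sym_var)
    dim = D.shape[0]
    Xi = np.kron(np.eye(dim // n_t), np.diag(sym_var))         # structure of (29)-(30)
    Delta = np.linalg.inv(np.eye(dim) + D @ Xi / noise_var)     # assumed form of (38)
    pos = kappa1 * n_t + t_index                                # nonzero position of e_t
    row = Delta[pos, :]                                         # e_t^T Delta
    gain = (row @ D)[pos]                                       # e_t^T Delta D e_t
    F = row / (noise_var + (1.0 - sym_var[t_index]) * gain)
    B = F @ D
    return F, B
```

Because the state has dimension $N_T(\kappa + L - 1)$ regardless of $k$, the storage and the size of the matrix inverse do not grow with the number of rounds, in contrast to the stacked form (27) and (28).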



H(fc) g cW„KxiVT(K+L-i) y(fe) jjjg ^iQ^j^ XoepHtz 
matrix and signal output of the sliding-window communication 
model at round k, respectively, and are given as. 



8 



PUBLISHED IN IEEE TRANSACTIONS ON COMMUNICATIONS, DEC. 2009 (PRINT AVAILABLE @ IHTTP://DX.DOI.ORG/I0.II09/TCOMM.2009.I2.0803I8l 



,„[sis] ses. 

T>m,t,t,n = log 



E 



cxp 



2(51' 



Jk) (k) 



(45) 



p[S»™i>] 



<fe) 



(47) 



H 



(fe) A 



H 



(fc) 



ffe) A 



(fe) 



H 



(fc) ... C^) 



yp) = HWs, + „f), 



H 



(fe) 

L-l 



(41) 

(42) 
(43) 



subsection, let ^^j^^^ denote the filter output at iteration n 

of round k, and (Cm!™ I ~ The soft 

demapper, which has a vector input in this case, computes 
extrinsic information 



[Symb] J . 

t>^tin according to (1471 ). where 



St,i,n St i,Afi 



^(fe-1) i(fe-) 



(48) 



(fe) A 



n 



iky 

i+Ki I 



(fe)^ 

n ■ 



(44) 



Recursions (39) and (40) are easily obtained by invoking (22) and the general structure (11). Details about the derivation of (35) are omitted because of space limitation. Assuming the conditional soft demapper input is Gaussian,
i.e., $\varsigma^{(k)}_{t,i,n} \mid s_{t,i} \sim \mathcal{N}\big(\alpha^{(k)}_{t,n} s_{t,i}, (\delta^{(k)}_{t,n})^2\big)$, extrinsic information $\phi^{[\mathrm{Sig}]}_{m,t,i,n}$ is computed according to (45), where



(fe) _ T,(fe) 



■'-'t.ri'^t 



= 1 



(46) 



and = {s € 5 | (^^^^(s) = f'}. The signal-level combining 
algorithm is summarized in Table I.
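The Gaussian assumption on the soft demapper input can be made concrete with a short Python sketch. It evaluates a bit-wise extrinsic LLR as a log-ratio of sums of Gaussian likelihoods over the two constellation subsets selected by the bit value. This is a generic max-free demapper written under stated assumptions, not a transcription of (45): the function names, the LLR sign convention log P(1)/P(0), the QPSK labeling, and the numerical values are all hypothetical.

```python
import math


def _logsumexp(values):
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def extrinsic_llrs(zeta, alpha, delta2, constellation, labels, prior_llrs):
    """Extrinsic LLR of every bit of one symbol from a scalar filter output.

    Assumes zeta | s is Gaussian with mean alpha * s and variance delta2.
    labels[j] is the bit tuple carried by constellation[j], and
    prior_llrs[q] = log P(bit q = 1) / P(bit q = 0).
    """
    num_bits = len(prior_llrs)
    gauss = [-abs(zeta - alpha * s) ** 2 / delta2 for s in constellation]
    llrs = []
    for m in range(num_bits):
        terms = {0: [], 1: []}
        for metric, bits in zip(gauss, labels):
            # Leave out the prior of bit m itself so that the result is extrinsic.
            prior = sum(prior_llrs[q] * bits[q] for q in range(num_bits) if q != m)
            terms[bits[m]].append(metric + prior)
        llrs.append(_logsumexp(terms[1]) - _logsumexp(terms[0]))
    return llrs


# Example alphabet: unit-energy QPSK, bit 0 set when the real part is negative,
# bit 1 set when the imaginary part is negative (hypothetical labeling).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]
LABELS = [(0, 0), (0, 1), (1, 0), (1, 1)]

llrs_first_point = extrinsic_llrs(QPSK[0], 1.0, 0.5, QPSK, LABELS, [0.0, 0.0])
llrs_last_point = extrinsic_llrs(QPSK[3], 1.0, 0.5, QPSK, LABELS, [0.0, 0.0])
```

For this separable labeling and zero priors, the LLR of bit 0 reduces to $-4\,\mathrm{Re}(\varsigma)\,a/\delta^2$ with $a = 1/\sqrt{2}$, which gives a quick sanity check of the sketch.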

Note that the forward-backward filtering structure (35), together with recursions (39) and (40), constitutes the core of the proposed algorithm and allows a reduced computational complexity and an optimized implementation. Indeed, recursions (39) and (40) make it possible to use, at each ARQ round, all signals and channel matrices corresponding to previous rounds $k-1, \ldots, 1$ without explicitly storing them at the receiver. This is performed in a recursive fashion using modified versions of the sliding-window input and channel matrix (i.e., $\underline{H}^{(k)\mathrm{H}}\underline{y}^{(k)}_i$ and $\underline{H}^{(k)\mathrm{H}}\underline{H}^{(k)}$, respectively) at round $k$.
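The storage argument can be illustrated with a minimal Python sketch of the two accumulators. It assumes, as the text indicates, that each round contributes $\underline{H}^{(k)\mathrm{H}}\underline{H}^{(k)}$ to the matrix accumulator and $\underline{H}^{(k)\mathrm{H}}\underline{y}^{(k)}_i$ to the signal accumulator, both initialized to zero. The class and function names are hypothetical, matrices are plain lists of rows, and the toy dimensions have no relation to the simulated system.

```python
def hermitian_times(h, x):
    """Return h^H x, with h and x given as lists of rows."""
    return [[sum(h[r][i].conjugate() * x[r][j] for r in range(len(h)))
             for j in range(len(x[0]))]
            for i in range(len(h[0]))]


def add_in_place(accumulator, update):
    for acc_row, upd_row in zip(accumulator, update):
        for j, value in enumerate(upd_row):
            acc_row[j] += value


class SignalLevelAccumulator:
    """Keeps only the running sums, never the per-round signals or channels."""

    def __init__(self, dim, block_len):
        self.gamma = [[0j] * dim for _ in range(dim)]      # matrix accumulator, starts at 0
        self.z = [[0j] * block_len for _ in range(dim)]    # column i is the vector for time i

    def update(self, h_k, y_k):
        """Fold round k into the sums; h_k and y_k can be discarded afterwards."""
        add_in_place(self.gamma, hermitian_times(h_k, h_k))
        add_in_place(self.z, hermitian_times(h_k, y_k))


# Two toy rounds with 2 x 2 channel matrices and 3 time instants.
h1 = [[1 + 1j, 2], [0, 1j]]
h2 = [[1, -1j], [2j, 1]]
y1 = [[1, 0, 1j], [2, 1, 0]]
y2 = [[0, 1, 1], [1j, 0, 2]]

acc = SignalLevelAccumulator(dim=2, block_len=3)
acc.update(h1, y1)
acc.update(h2, y2)
```

After both updates, the accumulators equal what would be obtained by stacking the two rounds into one taller channel matrix and one taller received signal, which is the virtual receive antenna view taken by signal-level combining.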



B. Symbol-Level Turbo Combining 

In this combining scheme, we propose to perform equalization separately for each round $k$ based on the communication model (43). Then, soft combining is conducted at the level of unconditional MMSE filter outputs: the output at iteration $n$ of round $k$ is combined with the outputs obtained at the last iteration of previous rounds $k-1, \ldots, 1$. As in the previous



^(1) ^(fe-1) ^(k) 



(49) 



and Aj'^ is the covariance matrix of | st^i^ which 

can be approximated as (assuming residual ISI plus noise 
terms at different rounds are independent), 

A^:2«diag{^^;^,...,^^-^)^e}• (50) 

The algorithm is summarized in Table II.
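One way to read the diagonal approximation (50) is that the vector-input likelihood factorizes over rounds. The Python sketch below, written under that assumption, evaluates the resulting per-candidate metric and also shows that, for independent Gaussian observations, it has the same dependence on the candidate symbol as a single maximum-ratio-combined scalar. The function names and numerical values are hypothetical, and the equivalence is a standard property that is not claimed to be how (47) is evaluated.

```python
def vector_metric(outputs, gains, variances, candidate):
    """Log-likelihood (up to a constant) of a candidate symbol given all rounds."""
    return -sum(abs(z - a * candidate) ** 2 / v
                for z, a, v in zip(outputs, gains, variances))


def combine(outputs, gains, variances):
    """Collapse per-round filter outputs into one unit-gain scalar and its variance."""
    weighted = sum(a.conjugate() * z / v for z, a, v in zip(outputs, gains, variances))
    total_gain = sum(abs(a) ** 2 / v for a, v in zip(gains, variances))
    return weighted / total_gain, 1.0 / total_gain


# Filter outputs of two rounds for the same symbol (illustrative values).
outputs = [0.9 + 0.1j, 0.4 - 0.2j]
gains = [1.0, 0.5]
variances = [0.5, 0.25]

zeta_eq, var_eq = combine(outputs, gains, variances)
```

Storing only the filter outputs, gains, and variances of earlier rounds is therefore sufficient to form the combined metric at round $k$.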

C. Complexity Analysis 

In this subsection, we focus on the analysis of the computational cost of forward and backward filters as well as the memory requirements for the proposed algorithms. The other steps are similar and have the same complexity for both algorithms. We also provide comparisons with the conventional LLR-level combining technique.

In the case of signal-level turbo combining, the computation of forward and backward filters involves, at each round $k$ and iteration $n$, one inversion of an $N_T(\kappa+L-1) \times N_T(\kappa+L-1)$ matrix (i.e., matrix $\Gamma^{(k)} + \sigma^2\tilde{\Xi}_n^{-1}$ in eq. (38)) for computing $\Lambda^{(k)}_n$, whose cost is $O(N_T^3\kappa^3)$ (assuming $\kappa \gg L$, and neglecting the cost of obtaining $\tilde{\Xi}_n^{-1} = I_{\kappa+L-1} \otimes \Xi_n^{-1}$ since $\Xi_n$ is diagonal). The size of this matrix does not depend on the round index, which indicates that the computational complexity of the signal-level combining scheme is less sensitive to $k$. The number of rounds only influences the number of additions required for obtaining vectors $\{\underline{z}^{(k)}_i\}_{0 \le i \le T-1}$ and matrix $\Gamma^{(k)}$, according to (39) and (40), respectively. The cost of these steps is

^The forward and backward filters can be easily derived using the equations in the previous subsection and assuming $k = 1$.
Table I

Summary of the signal-level turbo packet combining algorithm

0. Initialization

Initialize $\Gamma^{(0)}$ and $\{\underline{z}^{(0)}_i\}_{i=0}^{T-1}$ with $0_{N_T(\kappa+L-1) \times N_T(\kappa+L-1)}$ and vectors $0_{N_T(\kappa+L-1) \times 1}$, respectively.

1. Combining at round $k$

1.1. Update $\{\underline{z}^{(k)}_i\}_{i=0}^{T-1}$ and $\Gamma^{(k)}$ according to (39) and (40).

1.2. For $n = 1, \ldots, N$

1.2.1. Compute: conditional symbol averages and unconditional variances using (34) and (33).

1.2.2. Compute: $\Lambda^{(k)}_n$ using (38).

1.2.3. For $t = 1, \ldots, N_T$

1.2.3.1. Compute: $F^{(k)}_{t,n}$, $B^{(k)}_{t,n}$, $\alpha^{(k)}_{t,n}$, and $(\delta^{(k)}_{t,n})^2$ using (36), (37), and (46).

1.2.3.2. For each $i = 0, \ldots, T-1$, compute the soft demapper input according to (35).

1.2.3.3. For each $m = 1, \ldots, M$, compute extrinsic information $\phi^{[\mathrm{Sig}]}_{m,t,i,n}$ using (45).

1.2.4. End 1.2.3.

1.3. End 1.2.



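Table II

Summary of the symbol-level turbo packet combining algorithm

0. Initialization

Initialize $\{\underline{\varsigma}_{t,i}\}_{i=0}^{T-1}$, $\underline{\alpha}_t$, and $\underline{\delta}_t$ with empty vectors for $t = 1, \ldots, N_T$.

1. Combining at round $k$

1.1. For $n = 1, \ldots, N$

1.1.1. Compute: conditional symbol averages and unconditional variances using (34) and (33).

1.1.2. For $t = 1, \ldots, N_T$

1.1.2.1. Compute: forward and backward filters, $\alpha^{(k)}_{t,n}$, and $(\delta^{(k)}_{t,n})^2$ as in Subsection IV-A.

1.1.2.2. For each $i = 0, \ldots, T-1$, compute the filter output $\varsigma^{(k)}_{t,i,n}$.

1.1.2.3. For each $m = 1, \ldots, M$, compute extrinsic information $\phi^{[\mathrm{Symb}]}_{m,t,i,n}$ using (47).

1.1.3. End 1.1.2.

1.2. End 1.1.

1.3. Update: $\underline{\varsigma}_{t,i} \leftarrow [\underline{\varsigma}_{t,i},\ \varsigma^{(k)}_{t,i,N}]$, $\underline{\alpha}_t \leftarrow [\underline{\alpha}_t,\ \alpha^{(k)}_{t,N}]$, and $\underline{\delta}_t \leftarrow [\underline{\delta}_t,\ \delta^{(k)}_{t,N}]$ for $t = 1, \ldots, N_T$.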



ANAdd = {k + L - if + NrkT (51) 

for each round $k > 1$. Note that the number of operations required for obtaining $\underline{H}^{(k)\mathrm{H}}\underline{H}^{(k)}$ and $\underline{H}^{(k)\mathrm{H}}\underline{y}^{(k)}_i$ is not considered in (51) since symbol-level combining also involves the same operations. Therefore, the computational cost of forward and backward filters is almost the same for both combining algorithms. Note that the significant reduction in the complexity of the signal-level combining scheme (with respect to the dimensionality of the sliding-window model (21) used by the algorithm) is due to recursion (40), which consists of writing $\Gamma^{(k)}$ as the sum $\sum_{u=1}^{k} \underline{H}^{(u)\mathrm{H}}\underline{H}^{(u)}$.

Memory requirements for the two proposed schemes are determined by the update steps 1.1 of Table I and 1.3 of Table II. For the signal-level combining technique, an $N_T(\kappa+L-1) \times N_T(\kappa+L-1)$ complex matrix is required to accumulate channel matrices $\underline{H}^{(k)\mathrm{H}}\underline{H}^{(k)}$ according to (40) (thereby generating $\Gamma^{(k)}$), in addition to an $N_T(\kappa+L-1) \times T$ complex matrix that serves to accumulate signal vectors $\{\underline{z}^{(k)}_i\}_{i=0}^{T-1}$ using (39). Note that these two recursions, i.e., (39) and (40), avoid the storage of all signals and channel matrices as in MAP turbo combining. In the case of symbol-level combining, only $N_T$ complex matrices of size $K \times T$ and two $K \times N_T$ complex matrices are required to store filter outputs and their corresponding parameters, i.e., symbol gains and residual ISI plus thermal noise variances. Therefore, signal-level combining requires slightly more memory than its symbol-level counterpart, because only two or three ARQ rounds are considered (according to the outage analysis in Subsection III-C) and in general L.

Finally, note that in the case of conventional LLR-level combining, soft equalization is performed separately for each ARQ round exactly as in symbol-level combining, while extrinsic LLRs are added together before decoding. This translates into $N_T M T N$ real additions at each round, and a real vector of size $N_T M T$ to combine extrinsic values. Therefore, the three combining strategies have similar implementation requirements; they differ slightly in the number of additions and in storage memory.
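The storage figures quoted in this subsection can be turned into a small Python helper for trying out different configurations. The sketch simply counts the entries of the matrices and vectors listed in the text; the function and parameter names are hypothetical, and any block length passed to them is an input chosen by the user rather than a value stated here.

```python
def signal_level_memory(n_t, kappa, num_taps, block_len):
    """Complex entries: one square accumulator plus one accumulator of signal vectors."""
    dim = n_t * (kappa + num_taps - 1)
    return dim * dim + dim * block_len


def symbol_level_memory(n_t, max_rounds, block_len):
    """Complex entries: n_t matrices of size K x T plus two K x n_t matrices."""
    return n_t * max_rounds * block_len + 2 * max_rounds * n_t


def llr_level_memory(n_t, bits_per_symbol, block_len):
    """Real entries: one vector of accumulated extrinsic LLRs."""
    return n_t * bits_per_symbol * block_len
```

For example, `signal_level_memory(2, 9, 2, block_len)` counts the entries for a two-antenna transmitter with a filter of length 9 and two taps, for whatever block length `block_len` the reader supplies.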

V. Numerical Results 

In this section, we provide simulated BLER and throughput performance for the proposed turbo packet combining techniques presented in Section IV. Considering some representative MIMO configurations, our main focus is to demonstrate that the signal-level turbo combining approach has better ISI cancellation capability and diversity gain than the symbol-level approach. We also show that both techniques provide better performance than conventional LLR-level combining.

A. Simulation Settings 

In all simulations, we use an ST-BICM scheme composed of a 64-state rate-1/2 convolutional code with generator polynomials $(133_8, 171_8)$. The length of the code frame is 1800 bits including tail bits. We consider either quadrature phase shift keying (QPSK) or 16-state quadrature amplitude modulation (QAM) depending on the target rate $R$ of the ST-BICM code. The MIMO-ISI channel has the same profile as in Subsection III-C, i.e., two equal-power taps. With respect to the outage analysis in Section III, we consider an ARQ delay $K = 2$. We verified, with simulations, that for the considered ST-BICM code, the improvement in BLER performance is only incremental when $K > 2$. Note that in [38], only a four-state code is used, and performance results are reported with a maximum number of rounds $K = 3$. Simulations are carried out as in Subsection III-C, i.e., the transmission of an information block is stopped and the system moves on to the next block when an ACK message is received or the decoding outcome is erroneous after round $K = 2$.
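The stopping rule described above can be sketched in a few lines of Python. The decoder is abstracted as a hypothetical callback `decode_round(k)` that reports whether the block is decoded correctly after combining rounds 1 to k; nothing about the actual receiver is modeled here, and the function names are illustrative.

```python
def run_arq_block(decode_round, max_rounds=2):
    """Send one information block using at most max_rounds ARQ rounds.

    decode_round(k) is a hypothetical callback returning True when decoding
    succeeds after round k (an ACK is fed back). Returns (success, rounds_used).
    """
    for k in range(1, max_rounds + 1):
        if decode_round(k):
            return True, k          # ACK received: move on to the next block
    return False, max_rounds        # still erroneous after the last round


def block_error_rate(outcomes):
    """Fraction of blocks still in error when the protocol stopped."""
    return sum(1 for success, _ in outcomes if not success) / len(outcomes)


outcomes = [
    run_arq_block(lambda k: True),      # decoded at the first round
    run_arq_block(lambda k: k >= 2),    # decoded only after combining two rounds
    run_arq_block(lambda k: False),     # never decoded
]
```

Collecting the `(success, rounds_used)` pairs over many blocks is enough to estimate the BLER per ARQ round reported in the figures.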

Note that the benefits of an ARQ mechanism appear in the 
region of low to moderate SNR, where multiple transmissions 
are required to help correct packets erroneously received 
after the first round. For high SNR values, ARQ may not 
be needed because most packets are correct after the first 
transmission. Therefore, we focus our analysis on the SNR 
region where BLER values, after the first round, are between 
1 and 10~^. In this region, an ARQ protocol is essential to 
have reliable communication. Our main goal is to analyze the 
ISI cancellation capability and the achieved diversity order 
for the proposed turbo combining schemes. We, therefore, 
evaluate the BLER performance per ARQ round. We also 
evaluate the throughput improvement offered by the proposed 
schemes. The SNR appearing in all figures is per symbol 
per receive antenna. For both schemes, we consider five 
turbo iterations for decoding an information block at each 
transmission. We compare the resulting performance with the 
outage probability and the MFB. Note that for the purpose of 
fair comparison, the computation of the outage performance 
does not take into account the rate distortion as in (16).
The MFB curves are obtained for each transmission assuming 
perfect ISI cancellation and maximum ratio combining (MRC) 
of all time, space, multipath, and delay diversity branches. 

B. Analysis 

First, we consider an ST-BICM code with $N_T = 2$ and QPSK signaling. This corresponds to a rate $R = 2$. The number of receive antennas is $N_R = 2$, and the filter length is $\kappa = 9$ ($\kappa_1 = \kappa_2 = 4$) for all combiners. Fig. 4 compares the BLER performance for the signal-level, symbol-level, and LLR-level combining with the MFB and the outage probability. For both signal- and symbol-level turbo combining, the performance improvement after the second ARQ round



Figure 4. BLER performance comparison for $N_T = N_R = 2$, CC$(133_8, 171_8)$, QPSK, $K = 2$ rounds, and $L = 2$ taps.



is very significant compared with LLR-level combining. The signal-level combining scheme is shown to achieve the MFB, while the symbol-level scheme presents a gap of approximately 1 dB compared with the MFB. This means that signal-level combining has higher ISI cancellation capability than symbol-level combining. This result is due to the fact that in signal-level combining, each ARQ round is considered as a set of $N_R$ virtual receive antennas. This allows the ARQ delay diversity to be efficiently exploited. On the other hand, both proposed schemes are shown to achieve the asymptotic slope of the outage probability.

Now, we turn to ST-BICM codes with rate $R = 4$. First, we consider a configuration similar to that of the previous case but using 16-QAM modulation. The filter length is kept equal to $\kappa = 9$. The BLER performance is reported in Fig. 5. In this scenario, the signal-level scheme clearly outperforms both the LLR-level and the symbol-level schemes. Indeed, the gap between the latter and the MFB is about 2.25 dB. Both proposed techniques asymptotically achieve the diversity gain of the MIMO ARQ channel. In Fig. 6, we examine an ST-BICM code with $N_T = 4$, QPSK signaling, and $N_R = 2$. Note that this type of "unbalanced" configuration, i.e., more transmit than receive antennas, is suitable for the forward link. The filter length is increased to $\kappa = 13$ ($\kappa_1 = \kappa_2 = 6$) for all schemes. The signal-level combining technique is shown to achieve BLER performance close to the MFB (the gap is less than 0.5 dB), while both the LLR-level and the symbol-level techniques have a degraded probability of error (the gap
between the symbol-level and the MFB is more than 3dB at 
2 * lO'^BLER). It is also important to note that signal-level 
combining manifests itself in almost achieving the diversity 
gain while it is shown that symbol-level combining fails to 
do so. This is mainly due to the fact that, at the second 
ARQ round, the signal-level scheme constructs a 4 x 4 virtual 
MIMO-ISI channel for ISI cancellation and symbol detection, 
while the MIMO configuration remains unbalanced in the 
Figure 5. BLER performance comparison for $N_T = N_R = 2$, CC$(133_8, 171_8)$, 16-QAM, $K = 2$ rounds, and $L = 2$ taps.

case of symbol-level combining. In Fig. 7, we compare the throughput performance of the three algorithms for the $4 \times 2$ configuration. It is shown that signal-level combining offers higher throughput. Also, note that while the MFB achieves the maximum throughput of 4 bit/s/Hz, the proposed techniques saturate around 2 bit/s/Hz because most of the packets received in the first ARQ round are erroneous.
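The rates and virtual antenna counts quoted for the three configurations can be checked with a few lines of Python. The rate expression used here, number of transmit antennas times bits per symbol times code rate, is an assumption chosen because it reproduces the values $R = 2$ and $R = 4$ stated above; the function names are hypothetical.

```python
from fractions import Fraction


def st_bicm_rate(n_t, bits_per_symbol, code_rate=Fraction(1, 2)):
    """Information bits per channel use under the assumed rate expression."""
    return n_t * bits_per_symbol * code_rate


def virtual_receive_antennas(n_r, rounds):
    """Receive dimension seen by signal-level combining after the given number of rounds."""
    return n_r * rounds


rate_2x2_qpsk = st_bicm_rate(2, 2)     # two antennas, QPSK
rate_2x2_16qam = st_bicm_rate(2, 4)    # two antennas, 16-QAM
rate_4x2_qpsk = st_bicm_rate(4, 2)     # four antennas, QPSK
virtual_rx_second_round = virtual_receive_antennas(2, 2)
```

For the 4 x 2 configuration, two rounds give four virtual receive antennas, which matches the 4 x 4 virtual MIMO-ISI channel mentioned in the analysis.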

Finally, note that in practical systems, channel estimation is the bottleneck that causes performance loss. In [38], we evaluated the BLER performance for a low-rate ST-BICM code (typically, $N_T = N_R = 2$, and $R = 2$) with imprecise channel estimates and using signal-level turbo packet combining. We have shown that when MMSE channel estimation is performed in a turbo fashion together with turbo packet combining (i.e., the channel is iteratively re-estimated at each ARQ round using both pilot symbols and soft LLRs), the performance loss is less than 0.5 dB when $K = 2$, and does not exceed 1 dB when the ARQ delay is increased to $K = 3$. Also, we have shown that even for the case of short-term static dynamic, turbo channel estimation can offer attractive BLER performance without requiring the re-transmission of the pilot sequence since channel estimation in subsequent ARQ rounds can rely only on soft LLRs.

VI. Conclusion 

In this paper, we considered the design of efficient turbo 
packet combining schemes for MIMO ARQ protocols op- 
erating over frequency selective channels. First of all, we 
derived the structure of the optimal MAP packet combiner 
that exploits all the diversities available in the MIMO-ISI 
ARQ channel to perform transmission combining. Inspired 
by BTI . (I24l, we then investigated the outage probability 
and the outage-based power loss for Chase-type MIMO ARQ 
protocols operating over ISI channels. Then, we introduced 
two MMSE-based turbo combining schemes that exploit the 
delay diversity to perform transmission combining. The signal- 
level scheme considers an ARQ round as a set of virtual 



Figure 6. BLER performance comparison for $N_T = 4$, $N_R = 2$, CC$(133_8, 171_8)$, QPSK, $K = 2$ rounds, and $L = 2$ taps.



Figure 7. Throughput performance comparison for $N_T = 4$, $N_R = 2$, CC$(133_8, 171_8)$, QPSK, $K = 2$ rounds, and $L = 2$ taps (horizontal axis: SNR in dB).

receive antennas and performs packet combining jointly with 
ISI cancellation. The symbol-level scheme separately equalizes 
multiple transmissions, while combining is performed at the 
level of filter outputs. We showed that both combining schemes 
have computational complexities similar to that of the conven- 
tional LLR-level combining. Finally, we presented simulation 
results that demonstrated that signal-level combining provides 
better BLER and throughput performance than that of symbol- 
level and LLR-level combining. 